Electronic discovery computer program product

ABSTRACT

A system, apparatus, method, and computer program product for electronically stored file profiling and conversion including converting printable files to images, supported by meta-data, and one or more searchable master text files.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of and is based upon and claims the benefit of priority under 35 U.S.C. §120 for U.S. Ser. No. 10/821,949, filed Apr. 12, 2004. This application contains subject matter related to that disclosed in the following co-pending patent applications, the contents of each of which are incorporated herein by reference: U.S. patent application Ser. No. 10/227,389 filed on Aug. 26, 2002; U.S. Patent Application Ser. No. 60/437,440 filed on Jan. 27, 2003; and U.S. Patent Application Ser. No. 60/461,895 filed on Apr. 11, 2003.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to systems, apparatuses, methods, and computer program products for profiling and processing electronically stored document data. More particularly, the invention relates to data that may need to be produced by a party during the discovery phase of litigation, where the processing includes converting printable files to images, supported by meta-data, and one or more searchable text files.

2. Discussion of the Background

Computer-based discovery in legal proceedings is becoming increasingly widespread as tools providing cost-effective and legally sound discovery of electronic information are developed. An overview of computer-based discovery in federal civil litigation is provided in a Federal Courts Law Review article by Kenneth J. Withers, entitled “Computer-Based Discovery in Civil Litigation” and dated October 2000, the entire contents of which are incorporated herein by reference. This article notes how discovery is changing in response to the pervasive use of computers and how more and more cases involve e-mail, word processed documents and spreadsheets, and records of Internet activity. The article discusses the potential for computer-based discovery to reduce overall discovery costs and improve the administration of justice, and it also explores the unique problems of computer-based discovery. Its appendix provides a checklist of computer-based discovery considerations for pretrial conferences under U.S. Federal Rule of Civil Procedure 16(c).

In conducting computer-based discovery, problems arise from the vast quantities of electronic documents that must be reviewed, whether for a party's document production in litigation against another party, for an internal investigation, or for satisfying government reporting requirements. A party's ability to manage each matter, any of which can be mission critical, depends on how fast it can capture, identify, review, assess, and produce relevant documents. The volume of electronic documents today far exceeds that of paper documents.

According to a 2000 University of California study by Lyman, P. and Varian, H., entitled “How Much Information” (http://info.berkley.edu/how-much-info/), the entire contents of which are hereby incorporated by reference, over 90% of corporate documents are created electronically and an estimated 70% of those are never printed to paper. Additionally, e-mail communication among U.S. employees is approaching 3 billion messages a day. This has dramatically increased the volume, complexity, and cost of electronic document discovery. Moreover, e-mailing employees (custodians) often have multiple data sets contained in multiple messaging systems. Electronic documents, whether e-mail or other files stored on hard drives, backup tapes, etc., come in numerous file types (e.g., MICROSOFT WORD, COREL WORD PERFECT, MICROSOFT EXCEL, LOTUS 123, MICROSOFT OUTLOOK, and SYMANTEC ACT) as well as numerous versions. These documents are often encoded and may be virus infected. Often a party is required to produce these vast amounts of electronic documents in paper form, a process that can be unjustifiably expensive unless the retrieval of documents is narrowed to the relevant issues.

FIG. 1 is a flow chart that illustrates the electronic document legal discovery process common today. This conventional process begins in step S1 with accessing one or more data archives, followed by searching and filtering these archives in step S2 in order to identify documents that may be of interest, and printing these selected files in step S3. In some conventional systems, files of interest are not first converted to images before printing. Typically, the searching and filtering is restricted to parameters such as file owner, date, destination, or other high-level file meta-data. These files are typically not searched or filtered by size, by content for duplication, or by version, encryption/encoding, corruption, or viruses. Typically, files printed or converted to images via this process are manually reviewed (at great expense) for relevancy, redundancy, and readability.

As noted previously, many of the printed documents are eventually found to be redundant, encoded, or somehow corrupted and thus illegible. Furthermore, conventional search and filtering processes are rudimentary and result in the printing of documents that are not relevant to the legal discovery process. The costs of printing can be exorbitant, and costs increase greatly when the review time of legal staff at high hourly rates is added. What is desired, as recognized by the present inventors, is a way to electronically screen, select, archive, search, retrieve, and view documents that are relevant to the legal discovery process without incurring the large expense of converting to images and/or printing unwieldy, largely useless, and/or redundant materials that must then be reviewed in an inefficient, costly, manual manner.

In addition, conventional systems require the entire contents of an archive to be copied and sent to a remote facility for the above-described conventional file processing of FIG. 1. Thus, the inventors have also recognized the economic advantages, operational efficiencies, and enhanced privacy/security associated with an automated tool that (a) can be hosted at the facility in which the archives are located and (b) can be operated by the people knowledgeable about the content of the local archives.

In addition, conventional systems are limited by their reliance on file extensions to identify file type (e.g., .doc, .wpd, .pdf). Since an author can change or create a file's extension, the file extension is not always an accurate identifier of the file type. What is desired is a way to identify file type without relying solely on the file extension. Also, once the file type is identified, conventional systems are often characterized as having a single, predetermined method of viewing the text associated with the file. Furthermore, no conventional systems are known to be able to quickly convert a file to an image, let alone to a plurality of proprietary image file types.

Conventional systems include Daticon's Discovery OnDemand, Merrill Corporation's Discovery Navigator, LSI's Electronicode, Doculex's Discovery Cracker, Pacific Legal's Discover-e Web Repository Solution, Bowne's CaseSoft, Mobious' HardCopy Pro Plus and EDD Workstation, Image Capture Engineering's Z-Print, and Applied Discovery's online review product.

In addition, conventional systems are constrained by not being able to simultaneously conduct a text-based search and a structured-data query (e.g., SQL). This slows the process of electronic discovery and the assimilation of search results. What is also desired, as recognized by the present inventors, is a tool that allows simultaneous text-based and structured-data searching, data integration, and archiving.

SUMMARY OF THE INVENTION

The present invention addresses and resolves the above-identified limitations, as well as other limitations, of conventional electronic file review and legal discovery systems and methods. The present invention provides a site-hostable, easy-to-implement infrastructure and technology for electronic document discovery. The present invention includes a software-based data profiler tool and/or hardware that enables users to effectively support electronic document discovery.

In the present invention, the software-based data profiler tool accesses data stored in a computer readable medium and then:

(1) allows users to search files in an electronic archive based on pre-determined content information and/or metadata and then to drag-and-drop selected files into an electronic profiling folder;
(2) identifies the files within the electronic folder that can be printed and/or converted for downstream visualization, content searching, and meta-data searching;
(3) identifies duplicate files and/or documents that can be eliminated from the electronic folder;
(4) (optionally) identifies corrupted files that can be exported for further processing;
(5) (optionally) identifies, cleans, and/or deletes and/or exports virus infected files/documents from the electronic folder;
(6) (optionally) identifies, decodes/decrypts, and/or deletes and/or exports encoded/encrypted files/documents from the electronic folder;
(7) creates an image of selected files in the electronic folder and appends meta-data associated with each file/document converted to the image;
(8) (optionally) time-stamps and digitally authenticates each non-editable image and associated meta-data to protect against future manipulation or destruction;
(9) exports each image and associated meta-data to an image viewer, and/or a printer, and/or a computer configured to search the image's meta-data, and/or normalizes the files to a degree by making them all fit a predetermined (e.g., 8.5″×11″ letter sized) format, irrespective of the original document's size (e.g., a spreadsheet);
(10) creates one or more master text files, to include associated meta-data, containing the contents of one or more files from the electronic folder;
(11) (optionally) time-stamps and digitally authenticates the one or more master text files to protect against future manipulation or destruction; and
(12) exports the one or more text files containing the contents of some or all of the selected files, along with associated meta-data, to an image viewer, and/or a printer, and/or a computer configured to search the contents of the text file(s) and/or the meta-data of the text file(s).

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete appreciation of the present invention and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed descriptions and accompanying drawings:

FIG. 1 is a flow diagram of a conventional method of selecting files to print as part of a litigation discovery process;

FIG. 2 is a high-level flow diagram of a method of electronic document data profiling of the present invention;

FIG. 3 is a detailed flow diagram of a method of electronic document data profiling of the present invention;

FIG. 4 is a block diagram of the present invention;

FIG. 5 is a block diagram of another embodiment of the present invention;

FIG. 6 is a flow chart of another embodiment of the present invention; and

FIG. 7 is a block diagram of a computer used with the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The following comments relate to the drawings, wherein like reference numerals designate identical or corresponding parts throughout the several views.

FIG. 2 illustrates an overview of a method employed by the present invention. Data is accessed in step S21. The data may be located in one or more databases or on one or more computers or other data archives. Files from these archives are manually or automatically transferred to a working electronic folder in step S23 for file processing in step S25. In one embodiment, files are transferred to the working electronic folder via a tailorable, drag-and-drop user interface that may include using a computer mouse and/or other pointing device. The working electronic folder is tagged with meta-data including date created, last date opened, last date modified, creator name, matter name, and other identification and quality control data. Optionally, the working electronic folder may include a time-stamped audit file for recording a complete file history from file creation to file destruction.
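One way the optional time-stamped audit file might be kept is as an append-only log with one line per event. The following Python sketch is illustrative only: the JSON-lines layout, the field names, and the helper functions `append_audit_event` and `read_history` are assumptions, not part of the described system.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_audit_event(audit_path: Path, file_name: str, event: str, **details) -> dict:
    """Append one UTC-stamped event as a JSON line; earlier lines are never rewritten."""
    record = {
        "utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "file": file_name,
        "event": event,
        **details,  # e.g. matter name or operator ID (hypothetical fields)
    }
    with audit_path.open("a", encoding="ascii") as fh:
        # json.dumps escapes non-ASCII characters by default, so the log stays ASCII
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_history(audit_path: Path, file_name: str) -> list[dict]:
    """Return the recorded history of one file, oldest event first."""
    with audit_path.open(encoding="ascii") as fh:
        return [r for r in map(json.loads, fh) if r["file"] == file_name]
```

Because events are only appended, the history of a file from creation to destruction can be replayed in the order it was recorded.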

File processing S25 includes checking for duplicates and, optionally, for viruses and for encoding and/or encryption. Optionally, page estimation and time stamping/digital authentication are also performed. Duplicate files are identified by a hash or other unique identifier (e.g., an e-mail message ID). Files that cannot be processed are marked as exception files. Exception files may be files with a virus, files that are encrypted, files that are corrupted, or files of an unknown or deselected file type. Files that require special processing and/or conversion may be exported for special processing in step S200. Files marked as exception files are logged and may also be exported.

FIG. 3 illustrates details of the file processing of step S25. The inclusion of many of the following substeps varies with the embodiment, as does the ordering of the substeps.

Files are then sent to a duplication identification process in step S303. In one embodiment, file duplication is determined by the MD5 hash algorithm developed by Professor Ronald L. Rivest of MIT.
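A minimal sketch of hash-based duplicate identification, assuming files are compared by the MD5 digest of their raw bytes; the function names are hypothetical. Keeping the first copy and mapping each later copy back to it preserves a record of what was eliminated. Because MD5 is no longer considered collision resistant, an implementation could substitute `hashlib.sha256` without changing the structure.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_duplicates(paths: list[Path]) -> tuple[list[Path], dict[Path, Path]]:
    """Keep the first file seen for each digest; map every later copy to that original."""
    first_by_digest: dict[str, Path] = {}
    unique: list[Path] = []
    duplicates: dict[Path, Path] = {}
    for path in paths:
        key = md5_of_file(path)
        if key in first_by_digest:
            duplicates[path] = first_by_digest[key]
        else:
            first_by_digest[key] = path
            unique.append(path)
    return unique, duplicates
```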

De-duplicated files are checked for file corruption in step S305. Corrupted files are either deleted or exported for further processing S200.

Optionally, duplication-checked files are also subjected to a virus checking process in step S305. In one embodiment, virus checking is performed with a Perl File Scan Module supported by Amavis and Mime-defang.

Optionally, duplication-checked files may be sent to an encoding and encryption identification process in step S307. Encoded/encrypted files are either deleted or exported for further processing S200.
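How encoding or encryption is detected depends on the container format. As one narrow, illustrative case, the sketch below flags ZIP archives whose members have the encryption bit (bit 0 of the general-purpose flag) set, using only Python's standard `zipfile` module. The helper names and the three routing labels are assumptions; other formats would need their own checks.

```python
import zipfile
from pathlib import Path


def encrypted_zip_members(path: Path) -> list[str]:
    """Names of ZIP members whose general-purpose flag bit 0 (encrypted) is set."""
    with zipfile.ZipFile(path) as archive:
        return [info.filename for info in archive.infolist() if info.flag_bits & 0x1]


def classify_archive(path: Path) -> str:
    """Route a file as 'not-zip', 'encrypted' (export to S200), or 'clear'."""
    if not zipfile.is_zipfile(path):
        return "not-zip"
    return "encrypted" if encrypted_zip_members(path) else "clear"
```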

Optionally, files are then sent for time stamping/digital authentication and (optionally) page estimation in step S309. In one embodiment, page estimation is based on the actual page count. In another embodiment, page estimation is determined by a bytes-to-pages ratio that varies by file type. In another embodiment, actual page counts are read from file headers. At any time during this process, summary statistics can be stored, visualized, and/or printed.
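The bytes-to-pages embodiment can be sketched as a per-type lookup. The ratios below are placeholders chosen for illustration, not values from the described system; in practice they would be calibrated against sample documents of each type.

```python
import math

# Hypothetical bytes-per-page ratios; real values would be calibrated per file type.
BYTES_PER_PAGE = {
    "txt": 3_000,
    "doc": 25_000,
    "xls": 50_000,
}
DEFAULT_BYTES_PER_PAGE = 40_000  # used for types without a calibrated ratio


def estimate_pages(extension: str, size_bytes: int) -> int:
    """Estimate printed pages from file size; any non-empty file counts as at least one page."""
    if size_bytes <= 0:
        return 0
    ratio = BYTES_PER_PAGE.get(extension.lower().lstrip("."), DEFAULT_BYTES_PER_PAGE)
    return max(1, math.ceil(size_bytes / ratio))
```

For example, a 9,000-byte `.txt` file is estimated at three pages under these placeholder ratios.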

After file processing S25, selected files are converted in step S27. The file conversion of step S27 includes:

- extracting predetermined meta-data from each selected document into a document-specific file of meta-data (e.g., an ASCII file of extracted meta-data);
- creating an image of each selected file and appending the document-specific file of extracted meta-data;
- (optionally) time stamping/digitally authenticating both the image and the file of extracted meta-data;
- (optionally) creating a searchable master text file (e.g., .txt, .doc, .rtf, .wp, etc.) containing the contents of all the selected files, time stamping/digitally authenticating the master text file, and appending selected meta-data about the files included in the master text file; and
- (optionally) creating one or more searchable subordinate text files (e.g., .txt, .doc, .rtf, .wp, etc.) containing the contents of an operator-selected subset of all the selected files, time stamping/digitally authenticating the subordinate text files, and appending selected meta-data about the files included in the subordinate text files.
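A sketch of writing the document-specific ASCII meta-data file, assuming a simple `key: value` line layout and a `<name>.meta.txt` naming convention (both hypothetical). Only file-system fields are read here; author, recipient, and similar fields would come from format-specific plug-ins and are passed in by the caller.

```python
from datetime import datetime, timezone
from pathlib import Path


def write_metadata_file(doc: Path, extra: dict[str, str], out_dir: Path) -> Path:
    """Write <name>.meta.txt holding file-system fields plus caller-supplied fields."""
    st = doc.stat()
    fields = {
        "file_name": doc.name,
        "file_type": doc.suffix.lstrip(".").lower() or "unknown",
        "size_bytes": str(st.st_size),
        # Modification time normalized to UTC
        "modified_utc": datetime.fromtimestamp(st.st_mtime, timezone.utc).isoformat(timespec="seconds"),
        **extra,  # e.g. author, recipient, privilege header (hypothetical keys)
    }
    out = out_dir / f"{doc.name}.meta.txt"
    lines = [f"{key}: {value}" for key, value in fields.items()]
    # errors="replace" keeps the file pure ASCII even if a value has other characters
    out.write_text("\n".join(lines) + "\n", encoding="ascii", errors="replace")
    return out
```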

The meta-data extracted from the selected documents may relate to file content (e.g., litigation name, party name, etc.), content header information (e.g., privileged, confidential, etc.), file meta-data (e.g., author, recipient, date, etc.), file type (text document, presentation document, spreadsheet, etc.), or other criteria identified by the user (e.g., page count, individual keyword count, multiple keyword count, etc.). Time stamps may correspond to UTC time and/or another predetermined time zone. The meta-data file may be an ASCII file.
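The text does not specify how time stamping and digital authentication are performed. As a stand-in technique, the sketch below binds a SHA-256 digest of an image or text file to a UTC timestamp with an HMAC tag keyed by a secret held by the operator, so a later change to either the content or the timestamp is detectable. A production system might instead use public-key signatures or a trusted third-party timestamping service; the function names are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from typing import Optional


def stamp(content: bytes, key: bytes, when: Optional[datetime] = None) -> dict:
    """Bind a SHA-256 digest of the content to a UTC timestamp with an HMAC-SHA256 tag."""
    when = when or datetime.now(timezone.utc)
    record = {
        "sha256": hashlib.sha256(content).hexdigest(),
        "utc": when.astimezone(timezone.utc).isoformat(timespec="seconds"),
    }
    message = f'{record["sha256"]}|{record["utc"]}'.encode("ascii")
    record["tag"] = hmac.new(key, message, hashlib.sha256).hexdigest()
    return record


def verify(content: bytes, key: bytes, record: dict) -> bool:
    """Recompute digest and tag; any change to the content or the timestamp fails."""
    digest = hashlib.sha256(content).hexdigest()
    message = f'{digest}|{record["utc"]}'.encode("ascii")
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information about the tag
    return digest == record["sha256"] and hmac.compare_digest(expected, record["tag"])
```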

As noted above, profiling may also include extracting the text from the file(s) into an accompanying text file for later searching and filtering. Files that are images of text (e.g., image-only .pdf files) optionally may be converted to text with an OCR program. Either by predetermination or by selection, profiling may include either or both of the steps of compiling the meta-data and extracting the text. The text of the entire file may be extracted. Alternatively, portions of the text may be selected with a mouse-like device for extraction. In addition, the document may be searched for keywords, and the text around each keyword may then be selected for extraction.
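Extracting the text around a keyword can be illustrated with a fixed-width character window on each side of every hit. The window size and the whole-word, case-insensitive matching rule in this sketch are assumptions, and keywords are assumed to begin and end with word characters.

```python
import re


def keyword_snippets(text: str, keyword: str, window: int = 40) -> list[str]:
    """Return each whole-word, case-insensitive hit with up to `window` characters on each side."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        snippets.append(text[start:end])
    return snippets
```

With `window=4`, the text "The merger memo was sent." yields the snippet "The merger mem".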

In one embodiment, the file processing uses prioritized plug-in modules, with the prioritization scheme keyed to the file type. Each plug-in may be switched ‘ON’ or ‘OFF.’ The plug-ins can also have differing priorities so that files not recognized by extension can be routed first to the plug-ins most likely to recognize them. File type is determined both by identifying the file type extension and by evaluating the binary file header. The file type identification step may first consider the extension; when the extension is unknown, the binary header is evaluated. Alternatively, the binary header may be considered first. If the header and the extension conflict, either one may be treated as the default first choice, either arbitrarily or based on predetermined logic keyed to the suggested file type.
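The extension-plus-header identification can be sketched as two lookups and a tie-break switch. The signature table lists a few well-known leading byte sequences (PDF, ZIP, the OLE2 compound-file header used by legacy Office formats, gzip, and TIFF); the table contents, the `prefer_header` switch, and the function names are illustrative assumptions.

```python
from pathlib import Path
from typing import Optional

# A few well-known leading "magic" byte signatures; a real profiler would carry many more.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # also the container for .docx/.xlsx
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc/.xls/.ppt/.msg
    (b"\x1f\x8b", "gzip"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
]
EXTENSIONS = {"pdf": "pdf", "zip": "zip", "doc": "ole2", "xls": "ole2", "ppt": "ole2",
              "msg": "ole2", "gz": "gzip", "tif": "tiff", "tiff": "tiff"}


def type_from_header(head: bytes) -> Optional[str]:
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    return None


def identify(name: str, head: bytes, prefer_header: bool = True) -> tuple[Optional[str], bool]:
    """Return (suggested type, conflict flag) from the extension and the first header bytes."""
    by_ext = EXTENSIONS.get(Path(name).suffix.lower().lstrip("."))
    by_header = type_from_header(head)
    conflict = by_ext is not None and by_header is not None and by_ext != by_header
    if prefer_header:
        return (by_header or by_ext), conflict
    return (by_ext or by_header), conflict
```

A PDF saved with a `.txt` extension is still identified by its header, and a disagreement between extension and header is reported so it can be logged.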

Once a file type is suggested, the highest priority plug-in is used to read or otherwise view the text. For example, if file type A is suggested, the prioritized list of preferred plug-ins may be “1”, “5”, and “3”. Some plug-ins may be able to open multiple file types, and multiple plug-ins may be able to open a single file type. A plug-in may be created by incorporating libraries of commercially available software products, or it may be developed as a unique library of programming code that incorporates the functionality of a third-party library or application to load, image, and extract meta-data from a document. Examples of plug-ins include the Adobe Acrobat, AutoVue, Fallback, GnuZip, HTML, Lotus Notes, Microsoft Access, Microsoft Excel, Outlook, MSG, Microsoft PowerPoint, Microsoft Word, Tiff, Tar, and Zip plug-ins, among others.

Files that are not correlated to a particular plug-in, or files that cannot be read by the suggested plug-ins, may instead be read by the AutoVue plug-in.

Files that cannot be processed by the AutoVue plug-in may then be processed using Microsoft Windows File Type Associations and the Fallback plug-in, which may require installation of additional software applications. The Fallback plug-in is disabled by default and is used to attempt to profile unknown documents. It does so by accessing the Windows registry to determine whether a “print” verb is associated with the file's extension. If a “print” verb is associated with the extension, a new Windows process is started with that verb as its startup information, and the output is fed to the profiler's imaging print driver. The goal is to obtain images and text (but not meta-data, which requires lower-level file access) from files for which no specialized plug-in has been developed but for which third-party applications capable of handling them reside on the user's machine.
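The plug-in ordering described above (type-specific plug-ins by priority, then AutoVue, then the Fallback plug-in, which is off by default) amounts to walking a chain until some plug-in returns text. The sketch below models each plug-in as a callable that returns `None` when it cannot read the file; the `Plugin` class and the reader functions are hypothetical stand-ins, not the actual plug-ins named in the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Plugin:
    name: str
    reader: Callable[[bytes], Optional[str]]  # extracted text, or None if unreadable
    enabled: bool = True                      # the 'ON'/'OFF' switch


def read_with_plugins(data: bytes, file_type: Optional[str],
                      priorities: dict[str, list[str]],
                      plugins: dict[str, Plugin],
                      last_resorts: list[str]) -> tuple[Optional[str], Optional[str]]:
    """Try the type's prioritized plug-ins, then the last-resort chain; return (plugin, text)."""
    chain = priorities.get(file_type or "", []) + last_resorts
    for name in chain:
        plugin = plugins.get(name)
        if plugin is None or not plugin.enabled:
            continue  # not installed, or switched 'OFF'
        text = plugin.reader(data)
        if text is not None:
            return name, text
    return None, None  # caller marks and logs the file as an exception file
```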

Some identifiable file types may be designated as ‘not-to-be-processed’ files. An executable file is an example of a ‘not-to-be-processed’ file. However, some executable files may be processed since they may be assumed to contain text data (e.g., self-extracting .zip files). In general, ‘not-to-be-processed’ files consist of non-printable files.
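A sketch of the ‘not-to-be-processed’ rule with the self-extracting archive exception. It assumes an illustrative list of executable extensions and relies on the fact that Python's `zipfile.is_zipfile` locates the archive's end-of-central-directory record near the end of the file, so a ZIP archive appended to an executable stub is still recognized.

```python
import zipfile
from pathlib import Path

NOT_TO_BE_PROCESSED = {"exe", "dll", "sys"}  # illustrative list only


def should_process(path: Path) -> bool:
    """Skip designated non-printable types unless the file also carries a readable ZIP archive."""
    ext = path.suffix.lower().lstrip(".")
    if ext not in NOT_TO_BE_PROCESSED:
        return True
    # A self-extracting archive is an executable stub followed by a ZIP archive;
    # zipfile finds the end-of-central-directory record, so the archive is detected.
    return zipfile.is_zipfile(path)
```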

Processed files may be searched for keywords or key metadata. Those files containing the searched item or parameter are automatically selected for export. The operator may view and deselect any file prior to export. The operator may also add files that do not contain the searched item or parameter.

Files are then imaged by an imaging module, such as a TIFFing driver. The format of the image file is user-selectable from a predetermined set of document image formats (e.g., .tiff, .gif, .pdf). Preferably, the imaging module is capable of rapid imaging and is tailorable. An example of a fast TIFFing driver is the Microsoft Office Document Imaging 2003 (MODI) driver.

The images and/or master/subordinate text file(s) are exported (and optionally printed) in step S29 to an image viewer, and/or a printer, and/or a computer configured to search the corresponding meta-data and/or the master file's textual content. The exported data will include the images (e.g., TIFF) and may include the above-described meta-data file and/or extracted text file. The format of the exported file may be a proprietary litigation support software file type such as the IPRO Tech, Inc. (www.iprocorp.com) .lfp file type. Other specialty file types may include file types associated with litigation support software from Opticon, Concordance, Summation, and Ringtail. Also, a commercial data management file type may be used (e.g., Microsoft Access).
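As an example of an export format, the sketch below writes image cross-reference lines in the comma-delimited layout commonly used for Opticon load files: image key, volume, image path, document-break flag, folder break, box break, and page count. That field layout is stated here as an assumption and should be checked against the target review platform's documentation; the Bates prefix, numbering width, volume name, and `ImagedDocument` record are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ImagedDocument:
    prefix: str            # hypothetical Bates prefix, e.g. "ABC"
    first_number: int      # Bates number of the first page
    page_paths: list[str]  # one image file per page, in page order


def opticon_lines(doc: ImagedDocument, volume: str = "VOL001", width: int = 6) -> list[str]:
    """One line per page image; only the first page carries the 'Y' break and the page count."""
    lines = []
    for i, path in enumerate(doc.page_paths):
        image_key = f"{doc.prefix}{doc.first_number + i:0{width}d}"
        first = i == 0
        fields = [image_key, volume, path, "Y" if first else "", "", "",
                  str(len(doc.page_paths)) if first else ""]
        lines.append(",".join(fields))
    return lines
```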

Prior to export, the files may be searched and filtered against stored or user-entered search and filtering criteria (or criteria selected by a user). Search criteria may be based on file content (e.g., litigation name, party name, etc.), content header information (e.g., privileged, confidential, etc.), file meta-data (e.g., author, recipient, date, etc.), file type (text document, presentation document, spreadsheet, etc.), or other criteria identified by the user. Standard filtering criteria may be saved for future editing and/or queries. Additionally, once received, the exported files may again be searched and filtered.
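Saved filtering criteria can be represented as a small serializable record. In this sketch, which is an assumption rather than the described system's format, empty lists mean “no restriction”, dates are ISO `YYYY-MM-DD` strings, and every keyword must appear in the extracted text; the criteria round-trip through JSON so they can be stored and edited later.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Criteria:
    file_types: list[str] = field(default_factory=list)  # empty list means any type
    authors: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)    # all must appear in the text
    date_from: Optional[str] = None                      # ISO dates, inclusive
    date_to: Optional[str] = None

    def matches(self, meta: dict, text: str) -> bool:
        """True when a file's meta-data and extracted text satisfy every set criterion."""
        if self.file_types and meta.get("file_type") not in self.file_types:
            return False
        if self.authors and meta.get("author") not in self.authors:
            return False
        when = meta.get("date")  # ISO "YYYY-MM-DD" strings compare correctly as text
        if self.date_from and (when is None or when < self.date_from):
            return False
        if self.date_to and (when is None or when > self.date_to):
            return False
        lowered = text.lower()
        return all(k.lower() in lowered for k in self.keywords)

    def save(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def load(cls, saved: str) -> "Criteria":
        return cls(**json.loads(saved))
```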

In an alternative embodiment, files in the original databases may be pre-filtered following the database access step S21 and preceding the initial file selection step S23. The pre-filtering criteria may be predetermined or user-entered. The pre-filtering criteria may be based on file content (e.g., litigation name, party name, etc.), content header information (e.g., privileged, confidential, etc.), file meta-data (e.g., author, recipient, date, etc.), file type (text document, presentation document, spreadsheet, etc.), or other criteria identified by the user. Standard pre-filtering criteria may be saved for future editing and/or queries.

In another embodiment, the processes of FIG. 2 can be integrated with the e-mail and instant messaging archive processing described in co-pending Application Ser. No. 60/437,440 filed on Jan. 27, 2003. In this embodiment, both the e-mail/instant message files and their printable attachments are processed as described previously.

A sample set of results from the process of FIGS. 2 and 3 is found in Tables 1 and 2 below. The “extension types” column is an example of one of the predetermined search and filter criteria discussed above.

TABLE 1 Sample Detail Report

Extension Type   Viruses   Duplicates   Total Files   Estimated Pages
BAK              0         0            1             0
bmp              0         0            1             1
com              0         0            1             0
com-access_log   0         0            1             0
com-error_log    0         0            1             0
doc              0         0            3             3
eps              0         0            1             0
gif              0         1            22            300
html             0         0            19            19
jbf              0         0            2             0
jpg              0         4            46            46
ori              0         0            1             0
pl               0         0            1             0
png              0         1            41            0
psd              0         2            15            0
psp              0         0            17            0
TIF              0         4            9             0
tmp              0         0            1             0
txt              0         0            3             3
unknown          0         33           2             0
wmv              0         0            3             0

TABLE 2 Sample Summary Report

Total Files Transferred In: 600
Total Duplicates: 45
Total Files Encoded: 191
Total Files Exported for Processing: 23
Total Files Converted to Image: 321
Total Estimated Printable Pages: 472
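A report like Table 1 is a group-by over per-file results, and summary totals are column sums over that grouping. The sketch below shows the aggregation with a hypothetical `FileResult` record; it does not reproduce the sample data above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FileResult:
    extension: str        # empty string when no extension could be determined
    virus: bool
    duplicate: bool
    estimated_pages: int


def detail_report(results: list[FileResult]) -> dict[str, list[int]]:
    """Group by extension into [viruses, duplicates, total files, estimated pages] rows."""
    rows: dict[str, list[int]] = defaultdict(lambda: [0, 0, 0, 0])
    for r in results:
        row = rows[r.extension or "unknown"]
        row[0] += int(r.virus)
        row[1] += int(r.duplicate)
        row[2] += 1
        row[3] += r.estimated_pages
    # Case-insensitive ordering, so "BAK" sorts alongside "bmp"
    return dict(sorted(rows.items(), key=lambda kv: kv[0].lower()))


def column_totals(detail: dict[str, list[int]]) -> list[int]:
    """Sum each column across all extension rows."""
    return [sum(column) for column in zip(*detail.values())]
```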

In one embodiment of the present invention, the software is configured in accordance with a ‘plug-in’ architecture that allows for account-based reconfiguration of features; self-installing, externally delivered upgrades (e.g., via the Internet); and user-ID-driven license/account management.

FIG. 4 illustrates the overarching system architecture of the present invention. The legal discovery tool 41 accesses one or more electronic file archives 42 via an interconnection medium 43. The interconnection medium 43 is preferably a local area network but may also be a wireless link or direct storage media access. The electronic archives 42 may be of any commercial or proprietary structure (e.g., SQL, HTML, flat files, object-oriented) and content (e.g., documents, e-mail, annotated images, annotated audio/video, etc.). The legal discovery engine 44 performs a filtering and selection operation with pre-stored and/or operator-entered criteria 45. These criteria may include author name, file creation date, title, keyword, or other readily available meta-data. The results of the legal discovery process are stored in a separate repository 46. Files that pass the filtering process are then passed onward for file processing and conversion. Alternatively, files of interest are selected via a drag-and-drop or comparable process and then passed onward for file processing and conversion. Files that require special processing may be exported via multiple methods to a special processing infrastructure 47. At any time, files or statistical results of the legal discovery process may be sent to a printer 48 for printing via the interconnection medium 43.

FIGS. 5 and 6 are a block diagram and a flow chart, respectively, corresponding to another embodiment of the present invention in which a commercially available enterprise content management engine is adapted to carry out some of the novel features of the present invention. The commercially available content management engine is specifically adapted for the processes shown, including enabling simultaneous text and structured-data searching of electronic archives.

FIG. 7 is a basic block diagram of an example computer system 1201 upon which an embodiment of the present invention may be implemented. The computer system 1201 includes a bus 1202 or other communication mechanism for communicating information, and a processor 1203 coupled with the bus 1202 for processing the information. The computer system 1201 also includes a main memory 1204, such as a random access memory (RAM) or other dynamic storage device (e.g., dynamic RAM (DRAM), static RAM (SRAM), and synchronous DRAM (SDRAM)), coupled to the bus 1202 for storing information and instructions to be executed by processor 1203. In addition, the main memory 1204 may be used for storing temporary variables or other intermediate information during the execution of instructions by the processor 1203. The computer system 1201 further includes a read only memory (ROM) 1205 or other static storage device (e.g., programmable ROM (PROM), erasable PROM (EPROM), and electrically erasable PROM (EEPROM)) coupled to the bus 1202 for storing static information and instructions for the processor 1203.

The computer system 1201 also includes a disk controller 1206 coupled to the bus 1202 to control one or more storage devices for storing information and instructions, such as a magnetic hard disk 1207 and a removable media drive 1208 (e.g., floppy disk drive, read-only compact disc drive, read/write compact disc drive, compact disc jukebox, tape drive, and removable magneto-optical drive). The storage devices may be added to the computer system 1201 using an appropriate device interface (e.g., small computer system interface (SCSI), integrated device electronics (IDE), enhanced-IDE (E-IDE), direct memory access (DMA), or ultra-DMA).

The computer system 1201 may also include special purpose logic devices (e.g., application specific integrated circuits (ASICs)) or configurable logic devices (e.g., simple programmable logic devices (SPLDs), complex programmable logic devices (CPLDs), and field programmable gate arrays (FPGAs)).

The computer system 1201 may also include a display controller 1209 coupled to the bus 1202 to control a display 1210, such as a cathode ray tube (CRT), for displaying information to a computer user. The computer system includes input devices, such as a keyboard 1211 and a pointing device 1212, for interacting with a computer user and providing information to the processor 1203. The pointing device 1212, for example, may be a mouse, a trackball, or a pointing stick for communicating direction information and command selections to the processor 1203 and for controlling cursor movement on the display 1210. In addition, a printer may provide printed listings of data stored and/or generated by the computer system 1201.

The computer system 1201 performs a portion or all of the processing steps of the invention in response to the processor 1203 executing one or more sequences of one or more instructions contained in a memory, such as the main memory 1204. Such instructions may be read into the main memory 1204 from another computer readable medium, such as a hard disk 1207 or a removable media drive 1208. One or more processors in a multi-processing arrangement may also be employed to execute the sequences of instructions contained in main memory 1204. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions. Thus, embodiments are not limited to any specific combination of hardware circuitry and software.

As stated above, the computer system 1201 includes at least one computer readable medium or memory for holding instructions programmed according to the teachings of the invention and for containing data structures, tables, records, or other data described herein. Examples of computer readable media are compact discs, hard disks, floppy disks, tape, magneto-optical disks, PROMs (EPROM, EEPROM, flash EPROM), DRAM, SRAM, SDRAM, or any other magnetic medium, compact discs (e.g., CD-ROM), or any other optical medium, punch cards, paper tape, or other physical medium with patterns of holes, a carrier wave (described below), or any other medium from which a computer can read.

Stored on any one or on a combination of computer readable media, the present invention includes software for controlling the computer system 1201, for driving a device or devices for implementing the invention, and for enabling the computer system 1201 to interact with a human user (e.g., print production personnel). Such software may include, but is not limited to, device drivers, operating systems, development tools, and applications software. Such computer readable media further include the computer program product of the present invention for performing all or a portion (if processing is distributed) of the processing performed in implementing the invention.

The computer code devices of the present invention may be any interpretable or executable code mechanism, including but not limited to scripts, interpretable programs, dynamic link libraries (DLLs), Java classes, and complete executable programs. Moreover, parts of the processing of the present invention may be distributed for better performance, reliability, and/or cost.

The term “computer readable medium” as used herein refers to any medium that participates in providing instructions to the processor 1203 for execution. A computer readable medium may take many forms, including but not limited to non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical disks, magnetic disks, and magneto-optical disks, such as the hard disk 1207 or the removable media drive 1208. Volatile media include dynamic memory, such as the main memory 1204. Transmission media include coaxial cables, copper wire, and fiber optics, including the wires that make up the bus 1202. Transmission media may also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications.

Various forms of computer readable media may be involved in carrying out one or more sequences of one or more instructions to processor 1203 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions for implementing all or a portion of the present invention remotely into a dynamic memory and send the instructions over a telephone line using a modem. A modem local to the computer system 1201 may receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector coupled to the bus 1202 can receive the data carried in the infrared signal and place the data on the bus 1202. The bus 1202 carries the data to the main memory 1204, from which the processor 1203 retrieves and executes the instructions. The instructions received by the main memory 1204 may optionally be stored on storage device 1207 or 1208 either before or after execution by processor 1203.

The computer system 1201 also includes a communication interface 1213 coupled to the bus 1202. The communication interface 1213 provides a two-way data communication coupling to a network link 1214 that is connected to, for example, a local area network (LAN) 1215, or to another communications network 1216 such as the Internet. For example, the communication interface 1213 may be a network interface card to attach to any packet switched LAN. As another example, the communication interface 1213 may be an asymmetrical digital subscriber line (ADSL) card, an integrated services digital network (ISDN) card, or a modem to provide a data communication connection to a corresponding type of communications line. Wireless links may also be implemented. In any such implementation, the communication interface 1213 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.

The network link 1214 typically provides data communication through one or more networks to other data devices. For example, the network link 1214 may provide a connection to another computer through a local network 1215 (e.g., a LAN) or through equipment operated by a service provider, which provides communication services through a communications network 1216. The local network 1215 and the communications network 1216 use, for example, electrical, electromagnetic, or optical signals that carry digital data streams, and the associated physical layer (e.g., CAT 5 cable, coaxial cable, optical fiber, etc.). The signals through the various networks and the signals on the network link 1214 and through the communication interface 1213, which carry the digital data to and from the computer system 1201, may be implemented in baseband signals or carrier wave based signals. The baseband signals convey the digital data as unmodulated electrical pulses that are descriptive of a stream of digital data bits, where the term “bits” is to be construed broadly to mean symbol, where each symbol conveys at least one information bit. The digital data may also be used to modulate a carrier wave, such as with amplitude, phase and/or frequency shift keyed signals that are propagated over a conductive medium, or transmitted as electromagnetic waves through a propagation medium. Thus, the digital data may be sent as unmodulated baseband data through a “wired” communication channel and/or sent within a predetermined frequency band, different than baseband, by modulating a carrier wave. The computer system 1201 can transmit and receive data, including program code, through the network(s) 1215 and 1216, the network link 1214, and the communication interface 1213. Moreover, the network link 1214 may provide a connection through a LAN 1215 to a mobile device 1217 such as a personal digital assistant (PDA), laptop computer, or cellular telephone.

The present invention includes a user-friendly interface that allows individuals of varying skill levels to search numerous digital media archives and archive types, and that allows users to drag and drop selected files for one or more of the previously described processing steps. The user interface also allows users to design products and print statistical reports about information stored within these archives. The interface allows users to optionally enable virus checking and duplicate checking, as well as to determine and display the file types, the number of files, and the estimated number of printed pages of printable files. The interface also allows individuals to easily identify and tag duplicates, infected files, and encoded and encrypted files. The interface also allows individuals to create a time stamp for digital authentication of each file processed. The present invention allows such files to be sent to another device for further processing.
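The duplicate checking, file-type display, and per-file time stamping described above could be approached along the following lines. This is a minimal Python sketch under stated assumptions, not the patented implementation: it treats files with identical SHA-256 digests as duplicates, classifies file types by extension only, and records a local UTC processing time next to each digest as a simple audit entry. The function names `sha256_of` and `profile` are hypothetical, and a local clock reading is not by itself a trusted time stamp for digital authentication.

```python
import hashlib
from collections import Counter, defaultdict
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def profile(root: Path) -> dict:
    """Count file types, group byte-identical duplicates, and log each file.

    Assumption: the extension stands in for the file type; a real tool
    would also inspect file headers.
    """
    type_counts = Counter()
    by_hash = defaultdict(list)
    audit = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        type_counts[path.suffix.lower() or "<none>"] += 1
        digest = sha256_of(path)
        by_hash[digest].append(path)
        # Hypothetical audit entry: content hash plus local UTC processing time.
        audit.append({
            "file": str(path),
            "sha256": digest,
            "processed_utc": datetime.now(timezone.utc).isoformat(),
        })
    # Any digest shared by two or more files marks a duplicate group.
    duplicates = {h: paths for h, paths in by_hash.items() if len(paths) > 1}
    return {
        "file_count": len(audit),
        "type_counts": dict(type_counts),
        "duplicates": duplicates,
        "audit": audit,
    }
```

Hashing content rather than comparing names or sizes means that renamed copies are still caught, while files that differ by even one byte are kept apart.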

In one embodiment, the computer is configured as follows:

- Shuttle SB52G2 XPC with an optional SCSI Interface or 2nd Rocketdrive
- 3.06 GHz Pentium 4 Processor w/ Hyperthreading Technology & 533 MHz Frontside Bus
- 200 GB Storage Capacity w/ 8 MB Cache
- 6 GB Total System Memory
- DVD+-RW Drive, capable of Reading/Writing/Re-Writing to DVD/CD Media
- 6 USB 2.0 Ports
- 10/100 and 10/100/1000 LAN Interfaces
- Floppy Disk Drive
- Keyboard/Mouse
- Carrying Case
- 17″ Flat Panel Monitor
- Microsoft Windows XP Professional SP1
- Microsoft Office XP Professional

In another embodiment, the computer is configured as follows:

- SB51G XPC
- 3.06 GHz Pentium 4 Processor w/ Hyperthreading Technology & 533 MHz Frontside Bus
- 200 GB Storage Capacity w/ 8 MB Cache
- 6 GB Total System Memory
- DVD+-RW Drive, capable of Reading/Writing/Re-Writing to DVD/CD Media
- 6 USB 2.0 Ports
- 2 IEEE 1394 Firewire Ports
- 10/100 LAN Interface
- Floppy Disk Drive
- Keyboard/Mouse
- Carrying Case
- 17″ Flat Panel Monitor
- Microsoft Windows XP Professional SP1
- Microsoft Office XP Professional

The present invention also includes software and computer programs designed to enable electronic file import/profiling/conversion as described previously.
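One way to picture the conversion step with plug-in modules, exception files, and a priority keyed to file type (as recited in claims 7, 10, and 18 through 20) is the sketch below. It is illustrative only: the `Plugin` and `Result` types, the `EncryptedFile` exception, and the priority convention (lower number is tried first) are hypothetical, and each plug-in simply returns extracted text as a stand-in for real text extraction and imaging.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


class EncryptedFile(Exception):
    """Hypothetical signal raised by a plug-in that finds encrypted content."""


@dataclass
class Plugin:
    name: str
    extensions: Tuple[str, ...]        # lower-case extensions, no leading dot
    convert: Callable[[bytes], str]    # stand-in: returns extracted text
    enabled: bool = True               # the 'ON'/'OFF' selection
    priority: int = 100                # lower value is tried first


@dataclass
class Result:
    converted: Dict[str, str] = field(default_factory=dict)
    exceptions: Dict[str, str] = field(default_factory=dict)  # file -> reason


def process(files: Dict[str, bytes], plugins: List[Plugin],
            deselected: Iterable[str] = ()) -> Result:
    """Route each file to the highest-priority enabled plug-in for its type."""
    skip = set(deselected)
    result = Result()
    for name, data in files.items():
        if name in skip:
            result.exceptions[name] = "deselected"
            continue
        ext = name.rsplit(".", 1)[1].lower() if "." in name else ""
        candidates = [p for p in plugins if ext in p.extensions]
        active = sorted((p for p in candidates if p.enabled),
                        key=lambda p: p.priority)
        if not candidates:
            result.exceptions[name] = "unknown file type"
            continue
        if not active:
            result.exceptions[name] = "no enabled plug-in"
            continue
        try:
            result.converted[name] = active[0].convert(data)
        except EncryptedFile:
            result.exceptions[name] = "encrypted"
        except Exception as exc:  # any other failure is treated as corruption
            result.exceptions[name] = f"corrupted: {type(exc).__name__}"
    return result
```

Routing every unconvertible file into `exceptions` with a reason, rather than silently skipping it, leaves a record that can later be logged or exported, in the spirit of the exception-file handling described in the claims.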

Numerous modifications and variations of the present invention are possible in light of the above teachings. It is therefore to be understood that within the scope of the appended claims, the invention may be practiced otherwise than as specifically described herein.

1. A non-transitory computer readable storage medium including instructions that when executed by a computer cause the computer to process and convert electronically-stored data for electronic discovery in support of litigation according to a method comprising steps of: copying data and associated meta-data into a processor-based device from one or more data storage devices physically loaded by a user at a user-site located at the user-site, said processor-based device being located at said user-site; receiving user input from the user to said processor-based device for subsequent processing at the user-site of a working copy of the data and associated meta-data, said subsequent processing including storing in a local storage device said data and associated meta-data as the working copy for file processing, while maintaining a document context with respect to other documents, said working copy of the data including a plurality of files; said file processing of the plurality of files in the working copy of the data including: identifying a duplication of one or more files within the working copy of the data, including at least one of identifying a duplication by a hash algorithm, checking a duplicate file for file corruption, and exporting a duplicated file, and converting a selected file to an output file in a user-specified format with said computer if said selected file is convertible to the user-specified format, said step of converting including extracting and saving file meta-data associated with said selected file, extracting text from said selected file, and creating an image of said selected file, and said file processing further including checking for encryption of one or more files, and decrypting said one or more files if encrypted; and outputting the output file in the user-specified export format for at least one of exporting, reviewing and searching the output file in an external system, wherein said steps of physically loading and copying, inputting, processing and converting of the file processing, and outputting being performed by said processor-based device at said user-site.
2. The computer readable storage medium of claim 1, wherein said converting step includes identifying a file that is not convertible as an exception file.
3. The computer readable storage medium of claim 1, further comprising a step of: displaying summary statistics prior to said outputting step.
4. The computer readable storage medium of claim 3, wherein said step of displaying comprises: displaying a series of parent-child relationships between a file and an attachment to said file, said parent-child relationship being a document context.
5. The computer readable storage medium of claim 4, wherein at least said converting step is a step performed in a software plug-in module.
6. The computer readable storage medium of claim 1, wherein said metadata comprises: date created, last date opened, last date modified, creator name, matter name, and predetermined identification and quality control data.
7. The computer readable storage medium of claim 1, wherein said step of converting further comprises: marking a file that cannot be converted as an exception file.
8. The computer readable storage medium of claim 1, wherein said working copy comprises: a time-stamped audit file configured to record a file history spanning file creation to file destruction.
9. The computer readable storage medium of claim 1, further comprising: performing page counts and time stamping or digital authentication of said output file.
10. The computer readable storage medium of claim 7, wherein said file that cannot be converted comprises one of: a file with a virus; an encrypted file; a corrupted file; an unknown file-type; and a deselected file.
11. The computer readable storage medium of claim 7, wherein said step of marking a file that cannot be converted comprises one of: logging said exception file; and exporting said exception file.
12. The computer readable storage medium of claim 1, wherein said step of converting comprises one of: time stamping and digitally authenticating both an image and a file of extracted meta-data.
13. The computer readable storage medium of claim 1, wherein said step of converting comprises one of: time stamping or digitally authenticating the output file; and appending selected meta-data.
14. The computer readable storage medium of claim 1, wherein said step of creating comprises: creating one or more searchable subordinate text files containing the contents of an operator-selected subset of the selected files; time stamping or digitally authenticating the one or more subordinate text files; and appending selected meta-data about the files included in the subordinate text files.
15. The computer readable storage medium of claim 1, further comprising: extracting one of file content data, content header information, file meta-data, file type information, and file characteristic data.
16. The computer readable storage medium of claim 1, wherein said step of extracting text comprises: searching for a keyword.
17. The computer readable storage medium of claim 1, wherein said step of extracting text comprises: extracting a portion of text around said keyword.
18. The computer readable storage medium of claim 1, wherein said step of processing and converting comprises: processing with a prioritization scheme keyed to file type.
19. The computer readable storage medium of claim 18, wherein said step of processing with a prioritization scheme comprises: processing by one of file extension and file header with a plug-in module.
20. The computer readable storage medium of claim 19, wherein said step of processing with a plug-in module comprises: processing with a plug-in module configured to be selected to be ‘ON’ or ‘OFF’.
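Claims 16 and 17 recite searching extracted text for a keyword and extracting a portion of text around it. A minimal Python sketch of that keyword-in-context step follows; the function name `keyword_in_context`, the case-insensitive matching, and the fixed character `window` are assumptions chosen for illustration rather than details taken from the specification.

```python
import re
from typing import List


def keyword_in_context(text: str, keyword: str, window: int = 30) -> List[str]:
    """Return every case-insensitive hit for `keyword` together with up to
    `window` characters of surrounding text on each side."""
    if not keyword:
        return []  # an empty pattern would match at every position
    snippets = []
    # re.escape makes the keyword match literally, even if it contains '.' or '*'.
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        snippets.append(text[start:end])
    return snippets
```

For example, `keyword_in_context("The merger memo was sent.", "merger", 4)` returns `['The merger mem']`; the window is clipped at the start of the text rather than wrapping or padding.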